We consider the convolution model where i.i.d. random variables
$X_i$ having unknown density $f$ are observed with additive i.i.d. noise,
independent of the $X$'s. We assume that the density $f$ belongs to
either a Sobolev class or a class of supersmooth functions. The noise
distribution is known and its characteristic function decays either
polynomially or exponentially asymptotically.

We consider the problem of goodness-of-fit testing in the convolution
model. We prove upper bounds for the risk of a test statistic
derived from a kernel estimator of the quadratic functional $\int f^2$ based
on indirect observations. When the unknown density is sufficiently
smoother than the noise density, we prove that this estimator is $n^{-1/2}$-consistent,
asymptotically normal and efficient (for the variance we
compute). Otherwise, we give nonparametric upper bounds for the
risk of the same estimator.

We give an approach unifying the proof of nonparametric mini- 
max lower bounds for both problems. We establish them for Sobolev 
densities and for supersmooth densities less smooth than exponential 
noise. In the two setups we obtain exact testing constants associated 
with the asymptotic minimax rates. 

1. Introduction. We consider the convolution model,
(1) $Y_i = X_i + \varepsilon_i, \quad i = 1,\dots,n,$

where the random variables $X_i, \varepsilon_i$ are independent. We denote the common
unknown density of $X_i$, $i = 1,\dots,n$, by $f$. Let $\Phi(u) = \int e^{ixu} f(x)\,dx$ denote
its characteristic function. We observe only the $Y_i$, $i = 1,\dots,n$.

We consider the following nonparametric classes of density functions $f : \mathbb{R} \to \mathbb{R}_+$
with $\int f = 1$ and belonging to $L_2$. A Sobolev class of density functions
with smoothness $\beta > 0$ and radius $L > 0$ is defined by

(2) $W(\beta, L) = \left\{ f : \int |\Phi(u)|^2 |u|^{2\beta}\,du \le 2\pi L \right\}.$

A class of supersmooth density functions, for constants $\alpha, r, L > 0$, is defined by

(3) $S(\alpha, r, L) = \left\{ f \in C^\infty : \int |\Phi(u)|^2 \exp(2\alpha |u|^r)\,du \le 2\pi L \right\}.$

Let the noise be i.i.d. with known probability density $g$ and characteristic
function $\Phi^g$. Then the resulting observations have common density $p = f * g$
and characteristic function $\Phi^p = \Phi \cdot \Phi^g$. We also consider noise having a
nonnull Fourier transform, $\Phi^g(u) \ne 0$, $\forall u \in \mathbb{R}$. Typically two different behaviors
are distinguished in nonparametric estimation, polynomially smooth
(or polynomial) noise

(4) $|\Phi^g(u)| \sim |u|^{-\sigma}, \quad |u| \to \infty, \quad \sigma > 1,$

and exponentially smooth (or supersmooth or exponential) noise

(5) $|\Phi^g(u)| \sim \exp(-\gamma |u|^s), \quad |u| \to \infty, \quad \gamma, s > 0.$
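The two noise regimes (4) and (5) can be illustrated numerically. The following is a minimal sketch, assuming a centered Laplace law as an instance of polynomial noise with $\sigma = 2$ and a centered Gaussian law as an instance of exponential noise with $\gamma = 1/2$, $s = 2$; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def laplace_cf_abs(u, scale=1.0):
    """Modulus of the characteristic function of a centered Laplace law."""
    return 1.0 / (1.0 + (scale * u) ** 2)

def gaussian_cf_abs(u, sd=1.0):
    """Modulus of the characteristic function of a centered Gaussian law."""
    return np.exp(-0.5 * (sd * u) ** 2)

# Polynomial noise (4): |Phi^g(u)| * |u|^sigma stabilises for sigma = 2.
u_large = np.array([10.0, 100.0, 1000.0])
poly_ratio = laplace_cf_abs(u_large) * u_large ** 2

# Exponential noise (5): -log|Phi^g(u)| / |u|^s equals gamma = 1/2 for s = 2.
# Moderate u is used because exp(-u^2 / 2) underflows for very large u.
u_moderate = np.array([1.0, 5.0, 10.0])
exp_ratio = -np.log(gaussian_cf_abs(u_moderate)) / u_moderate ** 2
```

Here `poly_ratio` approaches 1 as $|u|$ grows, while `exp_ratio` is constant at $\gamma = 1/2$.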

The first problem considered in this paper is nonparametric minimax
goodness-of-fit testing from noisy data; that is, for a given density $f_0$ in
the smoothness class $W(\beta, L_0)$, respectively, $S(\alpha, r, L_0)$ with $L_0 < L$, decide
whether

$H_0 : f = f_0,$

or

$H_1(C, \psi_n) : f$ is in the smoothness class and $\int (f - f_0)^2 \ge C \psi_n^2,$

from observations $Y_1,\dots,Y_n$, for some fixed $C > 0$ and $\psi_n > 0$.

Many important applications of this problem can be found in biology,
medicine and physics, where errors-in-variables models have been extensively
used.

In genomics, it is appropriate to admit that microarray data contain er- 
rors from non-biological sources. Gene expression is measured by scanning 
the fluorescence intensity of the microarray (see, e.g., Speed [29]). Software 
packages give slightly different results due to different correction and nor- 
malization methods. Testing the underlying fluorescence density from the 
scanned measurements provides a calibration method to the practitioner's 
particular microarray and scanner. 

In medicine, many measurements are known to be subject to additive
errors. In particular, the National Health and Nutrition Examination Survey
(1976-1980), NHANES II, is a large dataset source of many studies of
errors-in-variables models; see, for example, Carroll, Ruppert and Stefanski [7]
for a previous study, NHANES I, and Delaigle and Gijbels [9]. The log-daily
saturated fat intake is known to be a typical variable subject to measurement
error and its probability density was estimated in the convolution
model, with errors having either a Laplace or a Gaussian law. This variable
is used to predict breast cancer, so the study is limited to women aged
from 25 to 50. It was noted that the underlying density is symmetric, very
smooth and has tails heavier than a normal distribution. Goodness-of-fit
testing would help to choose between different types of densities.

Another important application of our testing procedure is to mixing location
families $\{g(\cdot - \theta)\}_\theta$ with unknown mixing probability density $f$. The
observation $Y$ therefore has probability density $p(y) = \int g(y - \theta) f(\theta)\,d\theta = f * g$,
as in the convolution model.

Moreover, we suggest use of this methodology for determining $K$, the
unknown number of components in a finite mixture model. The astronomy
dataset from Roeder and Wasserman [28], consisting of velocities ($\times 10^{-2}$)
at which 82 galaxies from Corona Borealis spread away from our galaxy,
was thoroughly studied in a $K$-mixture model with unknown $K$; see, for example,
Stephens [31] and Richardson and Green [27]. Let $\theta_1,\dots,\theta_K$ be the
unknown states with the finite mixing probabilities $\{p_1,\dots,p_K\}$. In order to
fit into our theoretical framework, we suggest replacing the finite probability
by a continuous law having density $f_{0K} = \sum_{k=1}^{K} p_k f_0(\cdot - \theta_k)$, with $f_0$ a
peaked, supersmooth density. A preliminary estimation for different values
of $K = 1,\dots,\bar{K}$ provides estimators for $\{\theta_k\}_{k=1,\dots,K}$ and $\{p_k\}_{k=1,\dots,K}$. Then
use goodness-of-fit testing as described later to test $H_0 : f = f_{0K}$ iteratively
for $K = \bar{K},\dots,1$ until the null is accepted.

All the previous examples, among many other applications, fit our setting 
for different values of parameters associated with the underlying density and 
the noise. These examples were treated from the point of view of estimating 
the deconvolution density, not that of the testing problem. To our knowledge 
this is the first time minimax testing is performed from data contaminated 
with errors. We give here simulation results showing very good testing prop- 
erties between densities of the same families. As expected, testing quality is 
improved as the noise distribution becomes less smooth and/or has smaller 
variance. The test statistic has amazing convergence quality. 

In the convolution model (1), the problem of nonparametric estimation 
of the deconvolution density / has been intensively studied over the past 
two decades. In this paper, in order to surpass difficulties of estimation we 
address different issues, principally the goodness-of-fit test from noisy data 
in the L2 norm. 
Definition 1. For a given $0 < \xi < 1$, a test statistic $\Delta_n^*$ is said to attain
the testing rate $\psi_n$ over the smoothness class if there exists $C^* > 0$ such that

(6) $\limsup_{n \to \infty} \left( P_{f_0}[\Delta_n^* = 1] + \sup_{f \in H_1(C, \psi_n)} P_f[\Delta_n^* = 0] \right) \le \xi$

for all $C > C^*$. The rate $\psi_n$ is called the minimax rate of testing, if there
exists $C_* > 0$ such that

(7) $\liminf_{n \to \infty} \inf_{\Delta_n} \left( P_{f_0}[\Delta_n = 1] + \sup_{f \in H_1(C, \psi_n)} P_f[\Delta_n = 0] \right) \ge \xi$

for all $0 < C < C_*$, where the inf is taken over all test procedures $\Delta_n$.

Moreover, if $C_* = C^*$ we call $\psi_n$ the exact (or sharp) minimax rate of
testing.



We recall that the usual procedure is to construct the test statistic $\Delta_n^*$
such that (6) holds, also called the upper bound of the testing rate, and then
prove the minimax optimality of this procedure, that is, the lower bounds
in (7). If the test procedure does not depend on the smoothness of the
unknown functions (which may vary in some interval), it is called adaptive
to the smoothness and $\psi_n$ is the minimax adaptive rate.

Minimax and adaptive theory of testing has been extensively developed 
in density, regression and Gaussian white noise models when direct observa- 
tions are available. For nonparametric minimax rates in goodness-of-fit test- 
ing in different setups we refer to Ingster [18], Ermakov [11] and references 
therein. Exact minimax rates have been found; see, for example, Lepski and 
Tsybakov [22] for the regression model with pointwise and sup-norm dis- 
tances. The first adaptive rates were given by Spokoiny [30]. Exact minimax 
rates of testing for supersmooth functions are known only in the case r = 1 
and for the Gaussian white noise model (see Pouet [26]) with pointwise and 
sup-norm distances. A further development consists of a goodness-of-fit test 
for a parametric composite null hypothesis and adaptive to the smoothness 
as in Fromont and Laurent [14] and Gayraud and Pouet [15]. Goodness-of- 
fit tests can be based on the distribution function rather than the density 
function of our data. In view of results by Fan [12] the $n^{-1/2}$ rates are still
not feasible when estimating the distribution function in the convolution 
model. In view of numerous practical applications of testing, we expect the 
same problem in the context of data contaminated with errors to find similar 
extensive use in applied problems. 

Here, the goodness-of-fit problem is considered in quadratic norm, $(\int (f - f_0)^2)^{1/2}$.
As we can expect, the testing problem is easier than deconvolution
density estimation, that is, the testing rates are faster as they appear in
Table 2. Note that minimax $L_2$ testing can be performed at nearly the parametric
rate $(\log n)^{(\sigma + 1/4)/r} n^{-1/2}$ for supersmooth densities and polynomial
noise.

We actually give exact minimax rates of testing in setups with densities 
less regular than the noise: Sobolev densities and exponential noise, su- 
persmooth densities less smooth than the corresponding exponential noise 
(r < s). 

The natural test statistic in this context is an estimator of $\int (f - f_0)^2$,
where $f_0$ is given, from noisy data. Therefore, the second important problem
treated in this paper is the estimation of the quadratic functional $d := \int f^2$,
where $f$ is the density in the convolution model (1).

Definition 2. An estimator $d_n$ of $d$ is said to attain the rate $\varphi_n$ over the
smoothness class $W(\beta, L)$, respectively, $S(\alpha, r, L)$, if there exists a constant
$C > 0$ such that

(8) $\limsup_{n \to \infty} \sup_f \varphi_n^{-1} E_f[|d_n - d|] \le C,$

and this rate is called minimax if no other estimator attains better rates
uniformly over the class

(9) $\liminf_{n \to \infty} \inf_{d_n} \sup_f \varphi_n^{-1} E_f[|d_n - d|] \ge c$

for some $c > 0$, depending on fixed known parameters, where the supremum
is taken over all densities in the smoothness class and the infimum over all
estimators $d_n$.

In some cases $n^{-1/2}$-consistent estimators of $d$ exist and we prove the
asymptotic efficiency Cramér-Rao bound for such estimators (also called
efficient estimators).

Definition 3. An estimator $d_n$ of $d = \int f^2$ is asymptotically normally
distributed with asymptotic variance $W = W(f)$ if

$\sqrt{n}(d_n - d) \xrightarrow{d} N(0, W(f)).$

Moreover, it attains the asymptotic efficiency Cramér-Rao bound if for any
$f_0$ in the Sobolev class $W(\beta, L)$, respectively, in $S(\alpha, r, L)$, and a family of
shrinking neighborhoods $V(f_0)$ of $f_0$,

$\inf_{V(f_0)} \liminf_{n \to \infty} \sup_{f \in V(f_0)} n E_f[(d_n - d)^2] \ge W(f_0)$

for any other estimator $d_n$ of $d$.
When direct observations are available, it is well established that parametric
rates can be achieved for smooth enough densities belonging, for
example, to the Hölder class. Lower bounds for slower rates were found by
Bickel and Ritov [1] for smoothness values less than 1/4. In this context,
Laurent [20] gave efficient estimation at the parametric rate and Birgé and
Massart [2] proved nonparametric lower bounds for estimating more general
quadratic functionals. The study of general functionals was completed
by Kerkyacharian and Picard [19] for minimax rates and Tribouley [32] for
adaptive estimation. Nemirovski [25] gave asymptotically efficient estimators
of less smooth functionals, one or two times continuously differentiable.

In this paper, we give minimax results for setups in the nonparametric
"regime" and efficiency constant in the sense of the theory of Ibragimov and
Khas'minskii [17] and Levit [23] for asymptotically normal, $n^{-1/2}$-consistent
estimators (see Table 1).

Moreover, it is possible to generalize these results to models with partially
known noise distribution. Following results by Butucea and Matias [5],
we can consider noise distributions with unknown scaling parameter (some
more assumptions are needed in order to ensure identifiability in the model).
Current work is addressing the question of finding test procedures that will
require even less information about the noise distribution.

These procedures can also be made adaptive, that is, free of the smoothness
parameters, in some setups. We conjecture a loss of $\sqrt{\log n}$ due to
adaptation to $\beta$ for estimating $d$ (see Efromovich and Low [10]), respectively,
$\sqrt{\log \log n}$ for testing in the setup of Sobolev classes and polynomial
noise. On the contrary, the testing procedure can be made fully data dependent
with no loss in the rate for Sobolev densities and exponential noise and
we expect the same to happen for estimating $d$. For supersmooth densities,
computing the loss for adaptation is still an open problem.

The structure of the paper is as follows. In Section 2 we introduce the
estimator $d_n$ of $\int f^2$ and the test statistic $\Delta_n^*$ and give some simulation
results. In Section 3 we indicate the choice of the bandwidth in a functional's
estimator in order to prove either upper bounds in the minimax sense, or 
asymptotic normality and efficiency, according to different setups. In Sec- 
tion 4 we deal with the goodness-of-fit testing problem and, for each setup, 
we compute upper bounds for testing rates. Finally, in Section 5 we describe 
the approach unifying the proofs of minimax nonparametric lower bounds 
from Sections 3 and 4 and prove them for nonparametric setups of Sobolev 
classes of densities and polynomial, respectively, exponential noise, and for 
the bias dominated setup of supersmooth densities less smooth than expo- 
nentially smooth noise (r < s). We have provided detailed proofs for one 
setup (Sobolev densities and polynomial noise) and put all other proofs in 
the Appendix that the interested reader may find in a longer version of this 
paper [4]. 
2. Methodology and numerical results. In the described model, we consider
the problem of estimating $d = \int f^2$ from the available observations $(Y_i)_{i=1,\dots,n}$,
where the density $f$ of the variables $(X_i)_{i=1,\dots,n}$ is unknown. Let us denote by
$K_n$ the deconvolution kernel defined via its Fourier transform as

(10) $\Phi^{K_n}(u) = \left( \Phi^g(u/h) \right)^{-1} \Phi^K(u),$

where $K(x) = \sin(x)/(\pi x)$ is such that $\Phi^K(u) = I[|u| \le 1]$ and the bandwidth
$h = h_n \to 0$ when $n \to \infty$ will be specified later.
Define $d_n$, a bias-reduced estimator of $d$, by

(11) $d_n = \frac{1}{n(n-1)} \sum_{k \ne j} \int K_{n,h}(x - Y_k) K_{n,h}(x - Y_j)\,dx,$

where $K_{n,h}(\cdot) = h^{-1} K_n(\cdot/h)$.
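The estimator (11) can be evaluated in the Fourier domain rather than through the double sum of integrals. The sketch below is an illustration under stated assumptions, not the paper's implementation: it uses the sinc kernel, the Plancherel identity, and the fact that $\sum_{k \ne j} e^{iu(Y_k - Y_j)} = |\sum_k e^{iuY_k}|^2 - n$. The function name, the grid size and the toy parameters (standard Gaussian $X$, centered Laplace noise of scale 0.3, $h = 0.4$) are hypothetical choices.

```python
import numpy as np

def quadratic_functional_estimate(y, h, noise_cf, n_grid=801):
    """Fourier-domain evaluation of d_n in (11) with the sinc kernel.

    By Plancherel, <K_{n,h}(. - Y_k), K_{n,h}(. - Y_j)> equals
    (1/2pi) * integral over |u| <= 1/h of exp(iu(Y_k - Y_j)) / |Phi^g(u)|^2 du.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    u = np.linspace(-1.0 / h, 1.0 / h, n_grid)
    du = u[1] - u[0]
    ecf_sum = np.exp(1j * np.outer(u, y)).sum(axis=1)   # sum_k exp(i u Y_k)
    off_diag = np.abs(ecf_sum) ** 2 - n                 # sum over pairs k != j
    integrand = off_diag / (n * (n - 1) * np.abs(noise_cf(u)) ** 2)
    # Trapezoidal rule written out by hand.
    integral = du * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return integral / (2.0 * np.pi)

rng = np.random.default_rng(0)
n, noise_scale = 2000, 0.3
x = rng.normal(0.0, 1.0, n)                    # X ~ N(0,1): int f^2 = 1/(2 sqrt(pi))
y = x + rng.laplace(0.0, noise_scale, n)       # centered Laplace noise
d_hat = quadratic_functional_estimate(
    y, h=0.4, noise_cf=lambda u: 1.0 / (1.0 + (noise_scale * u) ** 2))
d_true = 1.0 / (2.0 * np.sqrt(np.pi))
```

In this toy setting `d_hat` should be close to `d_true`; the bandwidth here is fixed by hand, whereas the optimal choices are discussed in Section 3.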

In the sequel, we denote the $L_2$ scalar product of two functions $M$ and
$N$ by $\langle M, N \rangle = \int M(x) \overline{N}(x)\,dx$ and the complex conjugate of $N$ by $\overline{N}$.

In direct models, such a kernel based estimator can be found in Hall and
Marron [16]. A bias-reduced kernel estimator first appeared in Bickel and
Ritov [1], who proved that it is efficient for Hölder type smoothness values
greater than 1/4. Projection estimators were defined in Fan [13], Efromovich
and Low [10] and Laurent [20].

Let us construct a test statistic from noisy data. It is natural to suggest
as a test statistic $|T_n^*|$, where $T_n^*$ is the optimal estimator of the quadratic functional
$\|f - f_0\|_2^2$,

$T_n^* = \frac{1}{n(n-1)} \sum_{k \ne j} \langle K_{n,h}(\cdot - Y_k) - f_0,\; K_{n,h}(\cdot - Y_j) - f_0 \rangle,$

where $h \to 0$ as $n \to \infty$ and $K_n$ is defined in (10).
Define the test procedure

(12) $\Delta_n^* = I[|T_n^*| > C^* t_n^2]$

for a constant $C^* > 0$ and some threshold $t_n > 0$ depending on the setup.
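As an illustration of the test procedure (12), the statistic can be expanded as $T_n^* = d_n - \frac{2}{n} \sum_k \langle K_{n,h}(\cdot - Y_k), f_0 \rangle + \|f_0\|_2^2$ and each scalar product evaluated by Plancherel. The following is a rough sketch under assumptions: the sinc kernel, a hand-picked bandwidth, and a purely hypothetical rejection threshold standing in for $C^* t_n^2$ (the paper's thresholds depend on the setup). The names `gof_statistic` and `threshold` are illustrative.

```python
import numpy as np

def gof_statistic(y, h, noise_cf, null_cf, null_sq_norm, n_grid=801):
    """Fourier-domain evaluation of T_n^* with the sinc kernel."""
    y = np.asarray(y, dtype=float)
    n = y.size
    u = np.linspace(-1.0 / h, 1.0 / h, n_grid)
    du = u[1] - u[0]
    ecf_sum = np.exp(1j * np.outer(u, y)).sum(axis=1)    # sum_k exp(i u Y_k)
    phi_g, phi_0 = noise_cf(u), null_cf(u)
    quad = (np.abs(ecf_sum) ** 2 - n) / (n * (n - 1) * np.abs(phi_g) ** 2)
    cross = 2.0 * np.real(ecf_sum * np.conj(phi_0) / phi_g) / n
    integrand = quad - cross
    integral = du * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return integral / (2.0 * np.pi) + null_sq_norm       # add ||f_0||_2^2

rng = np.random.default_rng(2)
n, noise_scale, h = 2000, 0.3, 0.4
noise_cf = lambda u: 1.0 / (1.0 + (noise_scale * u) ** 2)   # centered Laplace noise
null_cf = lambda u: np.exp(-0.5 * u ** 2)                   # f_0 = N(0,1)
null_sq_norm = 1.0 / (2.0 * np.sqrt(np.pi))

y_null = rng.normal(0.0, 1.0, n) + rng.laplace(0.0, noise_scale, n)
y_alt = rng.normal(0.0, 2.0, n) + rng.laplace(0.0, noise_scale, n)
t_null = gof_statistic(y_null, h, noise_cf, null_cf, null_sq_norm)
t_alt = gof_statistic(y_alt, h, noise_cf, null_cf, null_sq_norm)

threshold = 0.02                     # hypothetical stand-in for C^* t_n^2
reject_null_sample = abs(t_null) > threshold
reject_alt_sample = abs(t_alt) > threshold
```

For the alternative $N(0, 2^2)$ used here, the squared distance has the closed form $\|f - f_0\|_2^2 = \frac{1}{4\sqrt{\pi}} + \frac{1}{2\sqrt{\pi}} - \frac{2}{\sqrt{10\pi}}$, which `t_alt` should approximate.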

In this paper, we chose the sinc kernel $K$, which has optimality properties.
We stress the fact that for numerical implementation better choices are
available, as was discussed in Butucea and Tsybakov [6]. Indeed, truncation
of the Fourier transform gives a kernel $K_n$ which has $\int |K_n| = \infty$. It is
enough to smooth $\Phi^K$ into a continuous trapezoidal-shaped function to get
an absolutely integrable kernel. We actually use

$\Phi^K(u) = I[|u| \le 1] + \exp\left( 1 - (|u|(2 - |u|))^{-2} \right) \cdot I[1 < |u| < 2],$

an infinitely differentiable function with compact support. The resulting
deconvolution kernel has as many finite moments as $g$, the density of the
noise, and the same optimality properties as our kernel $K_n$.
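The smoothed Fourier transform used for the numerical work can be coded directly from its formula. This is a small sketch; the function name is illustrative, and the only care needed is to avoid evaluating the exponent at $|u| = 2$, where $|u|(2 - |u|)$ vanishes.

```python
import numpy as np

def smooth_kernel_ft(u):
    """Flat-top Fourier transform: 1 on |u| <= 1, decaying to 0 on 1 < |u| < 2."""
    a = np.abs(np.atleast_1d(np.asarray(u, dtype=float)))
    out = np.zeros_like(a)
    out[a <= 1.0] = 1.0
    mid = (a > 1.0) & (a < 2.0)          # exponent is only evaluated here
    out[mid] = np.exp(1.0 - (a[mid] * (2.0 - a[mid])) ** -2)
    return out

values = smooth_kernel_ft([0.0, 1.0, 1.5, 1.999, 2.0, 3.0])
```

The function equals 1 at $|u| = 1$ from both sides, since $|u|(2 - |u|) = 1$ there, and it vanishes for $|u| \ge 2$.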
We consider $N = 100$ samples of size $n = 500$ and estimate the type I
error and the power of our testing procedure, as well as the mean squared
error of our test statistic for estimating $\|f - f_0\|_2^2$.

The noise distribution will be either ordinary smooth Laplace(1) $\times S + M$
having density $g(x) = (0.5/S) \exp(-|x - M|/S)$ and characteristic function
$\Phi^g(u) = e^{iuM} (1 + (Su)^2)^{-1}$, of order $\sigma = 2$, or Laplace(3) $\times S + M$ obtained
as the sum of three independent rescaled Laplace(1) variables, of order $\sigma = 6$.

Densities $f_0$ under the null hypothesis are either Gaussian $N(M, S)$,
belonging to a class of supersmooth functions $S(\alpha, r, L)$ with $r = 2$ and
$\alpha < S^2/2$, or the Laplace density Laplace(10) $\times S + M$ having characteristic
function $\Phi(u) = (1 + (Su)^2/10)^{-10}$ belonging to a Sobolev class with
$\beta < 20 - 1/2$. Other examples of densities can be found in Comte, Rozenholc
and Taupin [8], including Gamma, $\chi^2$, stable distributions, densities
with compactly supported characteristic functions and their mixtures.

We tested $f_0$: $N(1, 1)$ against successively rescaled Gaussian laws $N(1, 1 + (i - 1) \times 0.25)$,
$i = 1,\dots,8$, under both Laplace(1) and Laplace(3) errors. We
also tested $f_0$: Laplace(10) $\times 2$ against rescaled Laplace(10) $\times (2 + (i - 1) \times 0.25)$
and $f_0$: $N(1, 1)$ against shifted $N(1 + (i - 1) \times 0.25, 1)$, for $i = 1,\dots,8$,
under Laplace(1) $\times \sqrt{0.5}$ errors.

We get excellent estimated test power, rapidly increasing with i = 1, . . . , 8. 
We note that the power of the tests improves with the smoothness of tested 
densities, but it degrades with the smoothness of errors (when the signal 
to noise ratio is constant). These tests benefit from remarkable estimation
properties of the test statistic $T_n^*$, as we can see from the boxplots in Figure 1.

We also note that the results are very satisfactory for detecting a one-mode
density against a mixture of two identical densities. On the contrary,
it is difficult to detect a heavier tailed density than $f_0$ when all other parameters
are identical. This is due to the choice of the $L_2$ norm, and this
drawback is known in the testing literature. It would therefore be interesting,
and it is still an open problem, to design tests with different distances ($L_1$,
Kullback or $\chi^2$ distance in the alternative) in this model.

3. Estimation of $\int f^2$ in the convolution model. In this section we present
convergence properties of $d_n$ in (11) together with corresponding optimal
choice of tuning parameters in each setup. Rates are summed up in Table 1.

Definition 4. Let $d_n$ in (11) be the estimator of $d$ with bandwidth
$h > 0$. We call the bias and the variance of this estimator, respectively,

$B(d_n) = |E_f[d_n] - d|$ and $V(d_n) = E_f[|d_n - E_f[d_n]|^2].$

Fig. 1. Mean square error of $T_n^*$ for estimating $\|f - f_0\|_2^2$ for $f_0$: $N(1, 1)$ and $f$:
$N(1, 1 + (i - 1) \times 0.25)$ with Laplace(1) and Laplace(3) errors, in the upper graphics;
for $f_0$: Laplace(10) $\times 2$ and $f$: Laplace(10) $\times (2 + (i - 1) \times 0.5)$ and $f_0$: $N(1, 1)$ and $f$:
$N(1 + (i - 1) \times 0.25, 1)$ with Laplace(1) $\times \sqrt{0.5}$ errors in the lower graphics, $i = 1,\dots,8$.
The scaling is either $\times 10^{-3}$ or $\times 10^{-4}$, as indicated above the panels.



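3.1. Sobolev densities and polynomial noise. We study in detail the case
where the underlying density $f$ belongs to a Sobolev class $W(\beta, L)$, with
$\beta, L > 0$, defined in (2), and the noise is polynomial as defined in (4).

Proposition 1. For any density function $f$ in the Sobolev class $W(\beta, L)$,
the estimator $d_n$ in (11) with bandwidth $h > 0$, $h \to 0$ as $n \to \infty$ is such that

$B(d_n) \le L h^{2\beta},$

$V(d_n) \le \frac{2 \|p\|_2^2 (1 + \varepsilon_n)}{\pi (4\sigma + 1) n^2 h^{4\sigma + 1}} + \begin{cases} \dfrac{4 \Omega_g^2(f)}{n} (1 + \varepsilon_n), & \beta > \sigma, \\ \dfrac{E_n}{n h^{2(\sigma - \beta) + 1}}, & \beta < \sigma, \end{cases}$

where $\Omega_g(f) > 0$ is defined later in (13) and the sequences $\varepsilon_n$ and $E_n$ do
not depend on $f$ but depend only on $\beta$, $L$ and the noise density $g$, such that
$\varepsilon_n \to 0$ as $n \to \infty$, and $E_n$ bounded.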
In order to define $\Omega_g(f)$, let us note that for any $f$ in the Sobolev class
$W(\beta, L)$ and $g$ a noise density satisfying (4) with $\beta > \sigma$, we have $\Phi/\Phi^g$ a
continuous function which is absolutely and quadratically integrable (see
Lemma 4). Then we can define the function

$F(y) = \frac{1}{2\pi} \int e^{-iuy} \frac{\Phi(u)}{\Phi^g(u)}\,du,$

which is real-valued and uniformly continuous, but not necessarily a density
function. It is known (see Lukacs [24]) that if both characteristic functions
$\Phi$ and $\Phi^g$ are analytic around 0, then their quotient cannot be the characteristic
function of any distribution function. Nevertheless, this function
is bounded and its $L_2$ norm is uniformly bounded over densities $f$ in the
Sobolev class by $M_F$ depending only on $\beta$, $L$ and the fixed given density $g$.
Let

$\Omega_g^2(f) = \int F^2(y) p(y)\,dy - \left( \int f^2(x)\,dx \right)^2$

(13) $= E_f[F^2(Y)] - (E_f[F(Y)])^2.$

Indeed, $\int f(x)^2\,dx = (2\pi)^{-1} \langle \Phi, \Phi \rangle = (2\pi)^{-1} \langle \Phi^p, \Phi/\Phi^g \rangle = \langle p, F \rangle = E_f[F(Y)]$,
which is therefore a real number.

Remark 1. Note that (13) says that $4\Omega_g^2(f) = 4V(F(Y))$. This is heuristically
similar to the results given by Laurent [20]. She estimates $\int f^2$ from
direct observations and obtains the efficiency constant $4V(f(X)) = 4\int f^3 - 4(\int f^2)^2$
when $\beta > 1/4$. In Theorems 1 and 2 we describe the same change
of "regime" when $\beta > \sigma + 1/4$, respectively, $\beta < \sigma + 1/4$. Similarities between
deconvolution with $\sigma$-polynomial noise and the derivative of order
$\sigma$ have been noticed before. Indeed, we actually estimate $\int f^2 = E_f[F(Y)]$
here, where $F * g = f$, whenever the function $F$ exists, and $F$ is as difficult
to estimate as the $\sigma$-derivative of the function $f$.
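The identity $\int f^2 = E_f[F(Y)]$ with $F * g = f$ can be checked by simulation in a case where $F$ is explicit. The sketch below is only an illustration under assumptions: $f$ is the standard Gaussian density (supersmooth, so $F$ exists) and the noise is centered Laplace with scale $S$, for which $1/\Phi^g(u) = 1 + (Su)^2$ and hence $F = f - S^2 f''$. The function names and the sample size are hypothetical.

```python
import numpy as np

def gaussian_pdf(x):
    """Standard Gaussian density."""
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

def F_gauss_laplace(y, scale):
    """F with F * g = f for f = N(0,1) and centered Laplace noise.

    Since 1/Phi^g(u) = 1 + (scale*u)^2, F = f - scale^2 * f'' and
    f''(y) = (y^2 - 1) f(y) for the standard Gaussian density.
    """
    return gaussian_pdf(y) * (1.0 - scale ** 2 * (y ** 2 - 1.0))

rng = np.random.default_rng(1)
n, scale = 200_000, 0.5
y = rng.normal(size=n) + rng.laplace(0.0, scale, size=n)   # Y = X + noise
f_of_y = F_gauss_laplace(y, scale)
mean_F = f_of_y.mean()                    # Monte Carlo value of E_f[F(Y)]
omega2 = f_of_y.var()                     # Monte Carlo value of Omega_g^2(f) in (13)
d_true = 1.0 / (2.0 * np.sqrt(np.pi))     # int f^2 for N(0,1)
```

Up to Monte Carlo error, `mean_F` should match `d_true`, and `4 * omega2` approximates the asymptotic variance constant discussed in Remark 1 for this toy pair of densities.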

Proof of Proposition 1. Let us note that

$E_f[d_n] = E_f[\langle K_{n,h}(\cdot - Y_1), K_{n,h}(\cdot - Y_2) \rangle]$

(14) $= \|K_{n,h} * p\|_2^2 = \|K_h * f\|_2^2 = \frac{1}{2\pi} \int \Phi^K(hu) |\Phi(u)|^2\,du.$

By the Plancherel formula and equation (14)

$B(d_n) = \frac{1}{2\pi} \left| \int (\Phi^K(hu) - 1) |\Phi(u)|^2\,du \right| \le \frac{1}{2\pi} \int_{|u| > 1/h} (h|u|)^{2\beta} |\Phi(u)|^2\,du \le L h^{2\beta}.$
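As for the variance let us first write

$$d_n - E_f[d_n] = \frac{1}{n(n-1)} \sum_{k \ne j} \langle K_{n,h}(\cdot - Y_k) - K_h * f,\; K_{n,h}(\cdot - Y_j) - K_h * f \rangle + \frac{2}{n} \sum_{k=1}^{n} \langle K_{n,h}(\cdot - Y_k) - K_h * f,\; K_h * f \rangle = S_1 + S_2, \quad \text{say.}$$

The variables in $S_1$ and in $S_2$ are uncorrelated and all of them are centered.
Thus, $V(d_n) = E_f[S_1^2] + E_f[S_2^2]$. We have

$$E_f[S_1^2] = \frac{2}{n(n-1)} E_f[\langle K_{n,h}(\cdot - Y_1) - K_h * f,\; K_{n,h}(\cdot - Y_2) - K_h * f \rangle^2]$$
$$= \frac{2}{n(n-1)} \left( E_f[\langle K_{n,h}(\cdot - Y_1), K_{n,h}(\cdot - Y_2) \rangle^2] - 2 E_f[\langle K_{n,h}(\cdot - Y_1), K_h * f \rangle^2] + \|K_h * f\|_2^4 \right).$$

Moreover, $E_f[S_2^2] = 4 n^{-1} (E_f[\langle K_{n,h}(\cdot - Y_1), K_h * f \rangle^2] - \|K_h * f\|_2^4)$. Similarly
to Butucea [3], we have

$$E_f[\langle K_{n,h}(\cdot - Y_1), K_{n,h}(\cdot - Y_2) \rangle^2] = \iint \left( \frac{1}{h} \int K_n\left( z + \frac{v - u}{h} \right) K_n(z)\,dz \right)^2 p(u) p(v)\,du\,dv$$
$$= \frac{1}{h^2} \iint M_n^2\left( \frac{v - u}{h} \right) p(u) p(v)\,du\,dv = T, \quad \text{say,}$$

where $M_n(x) = \int K_n(z + x) K_n(z)\,dz$. Next, use the facts that $p$ is at least
$(\beta + \sigma - 1/2)$-Lipschitz continuous and uniformly bounded (Lemma 3 in the
Appendix [4]) to find $C$ and $M_Y$, positive constants depending only on $\beta$, $L$
and $\sigma$, such that

$$\left| T - \frac{1}{h} \|p\|_2^2 \|M_n\|_2^2 \right| \le \frac{1}{h} \iint |M_n(x)|^2 |p(v + hx) - p(v)|\,dx\,p(v)\,dv$$
$$\le \frac{1}{h} \int \left( \int_{|hx| \le \varepsilon} |M_n(x)|^2 C \varepsilon^{\beta + \sigma - 1/2}\,dx + \int_{|x| > \varepsilon/h} 2 M_Y |M_n(x)|^2\,dx \right) p(v)\,dv \le \frac{o(1)}{h} \|M_n\|_2^2,$$

where $o(1) \to 0$ as $h \to 0$, depending only on $\beta$, $L$ and the density $g$. We
choose $\varepsilon \to 0$ such that $\varepsilon/h \to \infty$ so that

(15) $T = \frac{\|p\|_2^2 \|M_n\|_2^2}{h} (1 + o(1)).$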






C. BUTUCEA 



By the Plancherel formula, $\|M_n\|_2^2 = \frac{1}{2\pi} \int |\Phi^{K_n}(u) \Phi^{K_n}(-u)|^2\,du = (\pi (4\sigma + 1) h^{4\sigma})^{-1} (1 + o(1))$.
Note that we should again split the integration domain
and evaluate the dominant term in the previous integral.
On the other hand, let us deal now with

(16) $E_f[\langle K_{n,h}(\cdot - Y_1), K_h * f \rangle^2] = E_f\left[ \left( \frac{1}{2\pi} \int_{|u| \le 1/h} e^{-iuY_1} \frac{\Phi(u)}{\Phi^g(u)}\,du \right)^2 \right] = U, \quad \text{say.}$

In Lemma 4 in the Appendix [4] we prove that $\Phi/\Phi^g$ is absolutely integrable
if $\beta > \sigma$. Then by the Lebesgue convergence theorem we see that there exists
a function

$F(y) = \lim_{h \to 0} \frac{1}{2\pi} \int_{|u| \le 1/h} e^{-iuy} \frac{\Phi(u)}{\Phi^g(u)}\,du,$

which is uniformly continuous and bounded such that $\|F\|_2^2 = \|\Phi/\Phi^g\|_2^2/(2\pi)$.
Note that $\overline{\Phi}(u) = \Phi(-u)$ and $\overline{\Phi^g}(u) = \Phi^g(-u)$, giving $\overline{F} = F$. Thus, $F$ is a
real-valued function.
Thus, we obtain

(17) $U = E_f[F^2(Y)] (1 + o(1)).$

Finally,

(18) $\|K_h * f\|_2^2 = \|f\|_2^2 (1 + o(1))$

by the bias computations.

Thus, from (15), (17) and (18) we get

(19) $E_f[S_1^2] = \frac{2 \|p\|_2^2 (1 + o(1))}{\pi (4\sigma + 1) n^2 h^{4\sigma + 1}} - \frac{4 E_f[F^2(Y)] (1 + o(1))}{n^2} + \frac{2 \|f\|_2^4 (1 + o(1))}{n^2} = \frac{2 \|p\|_2^2 (1 + o(1))}{\pi (4\sigma + 1) n^2 h^{4\sigma + 1}}.$

Use (17) and (18) to get

(20) $E_f[S_2^2] = \frac{4}{n} (E_f[F^2(Y)] - \|f\|_2^4) (1 + o(1)) = \frac{4 \Omega_g^2(f)}{n} (1 + o(1)).$

The upper bound for the variance follows from (19) and (20) for the case
$\beta > \sigma$.
For the case $\beta < \sigma$, go back to (16):

$$U \le \left( \frac{1}{2\pi} \int_{|u| \le 1/h} \frac{|\Phi(u)|}{|\Phi^g(u)|}\,du \right)^2 \le \frac{1}{(2\pi)^2} \left( O(1) + \int_{M < |u| \le 1/h} |u|^{\sigma} |\Phi(u)|\,du \right)^2$$

(21) $$\le O(1) \int_{M < |u| \le 1/h} |u|^{2(\sigma - \beta)}\,du \int_{M < |u| \le 1/h} |u|^{2\beta} |\Phi(u)|^2\,du \le \frac{O(1)}{h^{2(\sigma - \beta) + 1}}.$$

So, from (19) and (21) we obtain the upper bound for the variance when
$\beta < \sigma$. □

An easy consequence of Proposition 1 is that if the underlying unknown
density is sufficiently smoother than the noise ($\beta > \sigma + 1/4$) our parameter can
be estimated at the parametric rate. We establish next asymptotic normality
and a Cramér-Rao type of asymptotic efficiency bound.

Theorem 1. If $\beta > \sigma + 1/4$, the estimator $d_n$ defined in (11) with bandwidth
$h = h_*$ such that

$n^{-1} h_*^{-(4\sigma + 1)} \to 0$ and $h_* = o(n^{-1/(4\beta)})$

is an asymptotically normally distributed estimator of $d$, that is,

$\sqrt{n}(d_n - d) \xrightarrow{d} N(0, 4\Omega_g^2(f)).$
Moreover, it is asymptotically efficient, attaining the Cramér-Rao bound.

Proof. Let us decompose the risk of the estimator as

$E_f[|d_n - d|] \le B(d_n) + \sqrt{V(d_n)} \le L h^{2\beta} + \frac{2 \Omega_g(f)}{\sqrt{n}} (1 + o(1)),$

and then use Proposition 1. Indeed, if $\beta > \sigma + 1/4$ and if $n^{-1} h^{-(4\sigma + 1)} \ll 1$ we
see that $4\Omega_g^2(f)/n \, (1 + o(1))$ is the dominant term in the variance. Let us take
$h = o(n^{-1/(4\beta)})$ such that the bias is infinitely smaller, $L h^{2\beta} \ll 2\Omega_g(f)/\sqrt{n}$. So

$\sqrt{n}(d_n - d) = \sqrt{n}(d_n - E_f[d_n]) + \sqrt{n}(E_f[d_n] - d).$

The second term of the sum on the right-hand side, of absolute value $\sqrt{n} B(d_n)$, tends to 0 and the
asymptotic normality of the first term can be deduced from Butucea [3]. It
is in this case a classical central limit theorem for $U$-statistics of order 1.
For the Cramér-Rao bound, we follow the lines of proof in Laurent [20].
Similar results were given by Bickel and Ritov [1] following the theory of
Ibragimov and Khas'minskii [17] and Levit [23]. A first step of the proof
is to compute the Fréchet derivative of the functional $\int f^2 = \int F \cdot p$ at the
likelihood $p_0 = f_0 * g$,

$\int F \cdot p - \int F_0 \cdot p_0 = \int 2F_0 (p - p_0) + \int (F - F_0)(p - p_0)$

and $\int (F - F_0)(p - p_0) = o(\|p - p_0\|_2)$, when $\|p - p_0\|_2 \to 0$. Next, consider the
space orthogonal to the square root of the likelihood $\sqrt{p_0}$, $H = \{k : \int k \sqrt{p_0} = 0\}$,
and the projection operator onto this space: $P_{H(p_0)}(k) = k - (\int k \sqrt{p_0}) \sqrt{p_0}$.
Write $K_n(k) = K(k) = T'(p_0) \sqrt{p_0} \cdot P_{H(p_0)}(k)$ as $\langle t, k \rangle$. Then the minimal variance
is $\|t\|_2^2$.

Here, $T'(p_0) k = \int 2F_0 k$; then

$K(k) = \int 2F_0 \sqrt{p_0} \left( k - \left( \int k \sqrt{p_0} \right) \sqrt{p_0} \right) = \int (2F_0 \sqrt{p_0}) k - \left( \int 2F_0 p_0 \right) \int \sqrt{p_0}\, k.$

So, finally,

$\|t\|_2^2 = 4 \int |F_0|^2 p_0 - \left( \int 2F_0 p_0 \right)^2 = 4 V_{f_0}(F_0(Y)).$ □



In the following theorem we compute the rate on the nonparametric side
($0 < \beta < \sigma + 1/4$). We prove in Section 4 that this rate is optimal in the
minimax approach under the following additional assumption on the noise
distribution.

Assumption (P). The distribution of the polynomial noise in (4) is
such that $\Phi^g$ is at least three times continuously differentiable. Moreover
there exist $A_1, A_2 > 1$, $u_0, u_1, u_2 > 0$ large enough such that

$|\Phi^g(u)| \ge u_0 \quad \forall |u| \le A_1$

and

l(^) (fc) («)l< I^Hf far A; = 1,2, V|u| > A 2 . 

Theorem 2. If $0 < \beta < \sigma + 1/4$, the estimator $d_n$ of $d$ defined in (11)
with bandwidth $h_*$ satisfies the upper bound (8) for the rate $\varphi_n$, where

$h_* = n^{-2/(4\beta + 4\sigma + 1)}, \quad \varphi_n = n^{-4\beta/(4\beta + 4\sigma + 1)}.$

Moreover, under Assumption (P) this rate is minimax.

Proof of (8) for Theorem 2. If $0 < \beta < \sigma + 1/4$, $\|p\|_2^2/(\pi (4\sigma + 1) n^2 h^{4\sigma + 1})$
is the dominant term in the variance, whether $\beta > \sigma$ or $\beta < \sigma$.
The bandwidth $h_*$ minimizes the bias plus the variance. The upper bound
of the normalized mean error is less than $C = \max\{L, \sqrt{2M_p/(\pi (4\sigma + 1))}\}$;
see Lemma 3 in the Appendix [4]. □
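The bandwidth of Theorem 2 balances the bias bound $h^{2\beta}$ against the standard deviation term of order $(n^2 h^{4\sigma + 1})^{-1/2}$. The short sketch below, with an illustrative function name and example parameters, computes $h_*$ and $\varphi_n$ and checks this balance numerically (constants are ignored).

```python
import math

def sobolev_polynomial_rates(n, beta, sigma):
    """Bandwidth and rate of Theorem 2 (case 0 < beta < sigma + 1/4)."""
    if not 0 < beta < sigma + 0.25:
        raise ValueError("Theorem 2 covers 0 < beta < sigma + 1/4 only")
    expo = 4 * beta + 4 * sigma + 1
    h_star = n ** (-2.0 / expo)
    phi_n = n ** (-4.0 * beta / expo)
    return h_star, phi_n

n, beta, sigma = 10_000, 1.0, 2.0
h_star, phi_n = sobolev_polynomial_rates(n, beta, sigma)
bias_order = h_star ** (2 * beta)                     # order of the bias bound
sd_order = 1.0 / (n * h_star ** (2 * sigma + 0.5))    # (n^2 h^{4 sigma + 1})^{-1/2}
```

Both `bias_order` and `sd_order` coincide with `phi_n`, which is the sense in which $h_*$ balances the two terms.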

3.2. The other setups. In the case where the densities are smoother than
the noise, we can always define the function $F$ as the inverse Fourier transform
of $\Phi/\Phi^g$. The next theorem gives us the bandwidth $h_*$ so that $d_n$ is an
asymptotically normal and efficient estimator.

Theorem 3. The estimator $d_n$ defined in (11) with bandwidth $h_*$ such
that

$h_* = \left( \frac{\log n}{4\alpha} \right)^{-1/r}$

is asymptotically normally distributed and it is asymptotically efficient, attaining
the Cramér-Rao bound $4\Omega_g^2$ (see Definition 3):

(1) if $f$ belongs to $S(\alpha, r, L)$ and the noise is $\sigma$-polynomially smooth;

(2) if $f$ belongs to $S(\alpha, r, L)$ and the noise is exponentially smooth with
$r > s$ or with $r = s$ and $\alpha > \gamma$.

In the case where the noise is exponentially smooth and smoother than the
underlying density, estimation is always difficult, that is, only nonparametric
slower rates are attained. We prove the lower bounds (9), under the following
additional assumption, which is not very restrictive.

Assumption (E). The exponential noise distribution in (5) has a continuously
differentiable Fourier transform such that

$|(\Phi^g)'(u)| \le O(1) |u|^{A} \exp(-\gamma |u|^s),$

for large enough $|u|$ and some fixed constant $A \in \mathbb{R}$.

Theorem 4. Let the noise be exponentially smooth. The estimator $d_n$
of $d$ defined in (11) with bandwidth $h_*$ satisfies the upper bound (8) for the
rate $\varphi_n$, where:

(1) If $f$ belongs to $W(\beta, L)$,

$h_* = \left( \frac{\log n}{2\gamma} - \frac{2\beta + 1}{2\gamma s} \log \frac{\log n}{2\gamma} \right)^{-1/s}, \quad \varphi_n = L \left( \frac{\log n}{2\gamma} \right)^{-2\beta/s};$

moreover, under Assumption (E) this rate is minimax.
Table 1
Rates for estimating $d = \int f^2$ from indirect observations

f \ g | Polynomial: $|u|^{-\sigma}$ | Exponential: $\exp(-\gamma |u|^s)$
$W(\beta, L)$ | $\beta < \sigma + 1/4$: $O(1)\, n^{-4\beta/(4\beta + 4\sigma + 1)}$; $\beta > \sigma + 1/4$: $2\Omega_g n^{-1/2}$ | $O(1) (\log n/(2\gamma))^{-2\beta/s}$
$S(\alpha, r, L)$ | $2\Omega_g n^{-1/2}$ | $r < s$: $O(1) \exp(-2\alpha/h_*^r)$; $r > s$ or ($r = s$, $\alpha > \gamma$): $2\Omega_g n^{-1/2}$

Note, $h_*$ is the solution of (22).



(2) If $f$ belongs to $S(\alpha, r, L)$ and, either $r < s$ or $r = s$ and $\alpha < \gamma$, $h_*$ is
the solution of

(22) $\frac{2\alpha}{h_*^r} + \frac{2\gamma}{h_*^s} = \log n - (\log \log n)^2$

and $\varphi_n = L \exp(-2\alpha/h_*^r)$; moreover, under Assumption (E) this rate is
minimax when $r < s$.
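Since (22) defines the bandwidth only implicitly, a numerical solver is convenient in practice. The sketch below assumes that (22) reads $2\alpha/h^r + 2\gamma/h^s = \log n - (\log \log n)^2$; the left-hand side is decreasing in $h$, so bisection applies. The function name, the bracketing strategy and the example parameters are illustrative choices, and the rate is computed with $L = 1$.

```python
import math

def solve_bandwidth_22(n, alpha, r, gamma, s, tol=1e-12):
    """Solve 2*alpha/h^r + 2*gamma/h^s = log n - (log log n)^2 for h > 0."""
    target = math.log(n) - math.log(math.log(n)) ** 2
    if target <= 0:
        raise ValueError("n is too small: the right-hand side must be positive")

    def lhs(h):
        return 2.0 * alpha / h ** r + 2.0 * gamma / h ** s

    lo, hi = 1e-6, 1.0
    while lhs(hi) > target:      # enlarge the bracket until lhs(hi) <= target
        hi *= 2.0
    while lhs(lo) < target:      # shrink until lhs(lo) >= target
        lo /= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if lhs(mid) > target:    # lhs is decreasing in h: root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, alpha, r, gamma, s = 10 ** 6, 1.0, 1.0, 1.0, 2.0    # example with r < s
h_star = solve_bandwidth_22(n, alpha, r, gamma, s)
rate = math.exp(-2.0 * alpha / h_star ** r)            # phi_n with L = 1
```

For very small $n$ the right-hand side of (22) is not positive, in which case the function raises an error instead of returning a bandwidth.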



Note that when the density and the noise are both exponentially smooth,
the rates are faster than any logarithm but slower than any polynomial
in $n$; except when $r = s$ and $\alpha = \gamma$, the rate is nearly parametric, $\varphi_n =$
C3(logn) T 'I 2 1 ' yfn for h the solution of h r ~ l exp(4a//i r ) = en. 

4. Goodness-of-fit tests. Let us give here the convergence rates for the
testing procedure in (12) and optimal choice of tuning parameters. The
rates are given in Table 2. Note that for setups where we prove the lower
bounds for the testing rate we need to assume that the density $f_0$ in the null
hypothesis is such that

(23) $f_0(x) \ge \frac{c_0}{1 + |x|^2}$, $\forall x \in \mathbb{R}$.

Let us note immediately that we have a similar property for $p_0 = f_0 * g$.
Indeed, let $A > 1$ be large enough such that $\int_{-A}^{A} g(x)\,dx > 1/2$. Then there
is a Cq > such that 

(24) Po (x)> J*J ( x -y) g (y)dy><%mm{^,^ Vx £ R. 

We choose to work under the assumption (23) for simplicity. Notice that we
can as well solve the problem if $f_0$ decays asymptotically like a polynomial
(faster than $1/|x|^2$), but for technical reasons we would need to assume

Table 2

Rates for testing in $L_2$-norm from indirect observations

$f \backslash g$ | Polynomial: $\lvert u\rvert^{-\sigma}$ | Exponential: $\exp(-\gamma\lvert u\rvert^{s})$
$W(\beta,L)$ | $O(1)\,n^{-2\beta/(4\beta+4\sigma+1)}$ | $\sqrt{L}\,(\log n/(2\gamma))^{-\beta/s}$
$S(\alpha,r,L)$ | $O(1)(\log n)^{(\sigma+1/4)/r}\,n^{-1/2}$ | $r < s$: $\sqrt{L}\exp(-\alpha/h_*^r)$
r>s {l)^^eM^) 



Note, h* is defined in (22). 



that the characteristic function of the noise is smoother than C . Another 
way of proving the lower bounds consists of assuming (24), which is less
restrictive, but then we have to modify the construction of perturbation
functions according to the actual asymptotic behavior of $f_0$.

4.1. Sobolev densities and polynomial noise. Though two rates were
attainable in the same setup for estimating $d$, only one minimax rate for
testing is possible. This phenomenon is similar to the case of testing with
direct observations.

Theorem 5. The test procedure $\Delta^*$ defined in (12) for the threshold $t_n$
attains the rate $\psi_n$ and, under Assumption (P) and (23), $\psi_n$ is a minimax
rate of testing over the class $W(\beta,L)$, where

$h = h_* = n^{-2/(4\beta+4\sigma+1)}, \qquad t_n = \psi_n = n^{-2\beta/(4\beta+4\sigma+1)}$.
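
A minimal sketch of the tuning in Theorem 5, assuming its formulas read $h_* = n^{-2/(4\beta+4\sigma+1)}$ and $t_n = \psi_n = n^{-2\beta/(4\beta+4\sigma+1)}$. The function name is hypothetical.

```python
def sobolev_polynomial_test_tuning(n, beta, sigma):
    """Bandwidth, threshold and testing rate for Sobolev f and polynomial noise.

    Assumed reading of Theorem 5: h = n^(-2/(4*beta + 4*sigma + 1)) and
    t_n = psi_n = n^(-2*beta/(4*beta + 4*sigma + 1)).
    """
    denom = 4.0 * beta + 4.0 * sigma + 1.0
    h = n ** (-2.0 / denom)
    psi_n = n ** (-2.0 * beta / denom)
    t_n = psi_n  # threshold and rate coincide in this reading
    return h, t_n, psi_n
```

Two identities follow directly from these formulas: $\psi_n = h_*^{\beta}$, and $n^2 h_*^{4\beta+4\sigma+1} = 1$, which is the balance between the squared bias $h^{2\beta}$ and the standard deviation of order $n^{-1}h^{-(2\sigma+1/2)}$.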

Proof of (6) for Theorem 5. Let us bound from above successively
the first and second type errors. Note that, for a fixed density $f_0 \in W(\beta, L)$,

$E_{f_0}[T_n^*] = \|K_h * f_0 - f_0\|_2^2 = L h^{2\beta} o(1)$,

similarly to the proof of Proposition 1. In order to compute the variance let
us write

$T_n^* - E_{f_0}[T_n^*] = \frac{2}{n(n-1)}\sum_{k<j}\langle K_{n,h}(\cdot - Y_k) - K_h * f_0,\ K_{n,h}(\cdot - Y_j) - K_h * f_0\rangle$.

Note that the previous sum is null, since for all k = 1, . . . ,n, 
(K n>h (--Y k )-K h *f ,K h *f ) 

= J_ / ^uY k ,§9( hu \ _ $ ( u ))$^ (hu)$ (u) du 

2ir J 

= {K nA (--Y k )-K h *f ,fo). 
Finally, $V_{f_0}[T_n^*] = S\|p_0\|_2^2\, n^{-2} h^{-(4\sigma+1)}(1 + o(1))$, where $S = 2/(\pi(4\sigma + 1))$. So
the first type error can be written as

p*m > < < o(i)%^ < | 

for $C^*$ large enough. For the second type error, consider a density $f$ in
$H_1(C,\psi_n)$. Then $E_f[T_n^*] = \|K_h * f - f_0\|_2^2$. The bias can be bounded from
above as

$B[T_n^*] = \big|\|K_h * f - f_0\|_2^2 - \|f - f_0\|_2^2\big|$

$= \big|\|K_h * f\|_2^2 - \|f\|_2^2 - 2\langle K_h * f - f, f_0\rangle\big|$

$\le \frac{1}{2\pi}\int_{|u|>1/h}|\Phi(u)|^2\,du + \frac{1}{\pi}\int_{|u|>1/h}|\Phi(u)|\,|\Phi_0(u)|\,du$

$\le L h^{2\beta}(1 + o(1))$,

since $\int_{|u|>1/h}|u|^{2\beta}|\Phi_0(u)|^2\,du = o(1)$, for the fixed density $f_0$. In order to
evaluate the variance, let us write

$T_n^* - E_f[T_n^*] = \frac{2}{n(n-1)}\sum_{k<j}\langle K_{n,h}(\cdot - Y_k) - K_h * f,\ K_{n,h}(\cdot - Y_j) - K_h * f\rangle$

$+ \frac{2}{n}\sum_{k}\langle K_{n,h}(\cdot - Y_k) - K_h * f,\ f - f_0\rangle$

$= S_1(f) + S_2(f - f_0)$,

say. As in Proposition 1, the last two terms are uncorrelated, so $V_f[T_n^*] =
E_f[|S_1(f)|^2] + E_f[|S_2(f - f_0)|^2]$. Similar computation leads for $h = h_*$ to the
upper bound

where $\Omega_g^2(f - f_0) = \int (F - F_0)^2 p - (\int (f - f_0) f)^2$ and we have used the constant
$M_p > 0$, such that $\sup_f \|p\|_\infty \le M_p$, introduced in Lemma 3 in the Appendix
[4].

Let us note that whenever $\beta > \sigma$, we find $M > 0$ large enough such that

$\Omega_g^2(f - f_0) \le \int (F - F_0)^2 p \le M_p\|F - F_0\|_2^2 \le \frac{M_p}{2\pi}\int \frac{|\Phi(u) - \Phi_0(u)|^2}{|\Phi^g(u)|^2}\,du$

$\le \int_{|u|\le M} c_1 M^{2\sigma}|\Phi(u) - \Phi_0(u)|^2\,du + \int_{|u|>M} c_2 |u|^{2\sigma}|\Phi(u) - \Phi_0(u)|^2\,du$

$\le c_3\Big(M^{2\sigma}\|f - f_0\|_2^2 + \frac{1}{M^{2(\beta-\sigma)}}\int_{|u|>M}|u|^{2\beta}|\Phi(u) - \Phi_0(u)|^2\,du\Big)$

$\le C\|f - f_0\|_2^{2-2\sigma/\beta}$,

where $C$ is a constant depending only on $\beta$, $L$ and the fixed noise probability
density $g$. This inequality is useful for the limit cases in $H_1$ where
$\|f - f_0\|_2 \to 0$. So, the second type error can be bounded as

$P_f[|T_n^*| \le C^* t_n^2] \le P_f\big[|T_n^* - E_f[T_n^*]| \ge \|f - f_0\|_2^2 - C^* t_n^2 - B[T_n^*]\big]$
\K-E f [T*)\ 



<P 



Vf[T*] 



> 



||/-/o||i-C> 2 -L^ 



d(n^+V2)-i + C2 ||/ - /oll^VVa/^ > a ) 

Either $0 < \beta < \sigma + 1/4$, when the probability above is less than
$c_1(C - C^* - L)^{-2} \le \varepsilon/2$ for $C > C^*$ large enough, or $\beta > \sigma + 1/4$, when



P f \\T:\<C*tl]<P-. 



o(l) 



7 

< n -(2/3+2<7+l)/(4/3+4a+l) 



for C > C* large enough. 

The upper bounds in (6) are proved. For the lower bounds in (7) see 
Section 5. □ 

4.2. The other setups. We know now that in some setups we can estimate
$d$ at the parametric $n^{-1/2}$ rate. We shall see next that the minimax testing
rate is necessarily nonparametric.

Theorem 6. The test procedure $\Delta^*$ defined in (12) for the bandwidth
$h_*$, the threshold $t_n$ and the constant $C^*$ satisfies the upper bound (6) for the
rate $\psi_n$, where:



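(1) If $f$ belongs to $S(\alpha,r,L)$ and the noise is polynomially smooth,

$h = h_* = \Big(\frac{\log n}{2\alpha} - \frac{2\sigma+1/2}{2\alpha r}\log\log n\Big)^{-1/r}, \qquad t_n = \psi_n = \frac{1}{\sqrt{n}}\Big(\frac{\log n}{2\alpha}\Big)^{(4\sigma+1)/(4r)}$.
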
(2) If $f$ belongs to $W(\beta,L)$ and the noise is exponentially smooth,



moreover, under Assumption (E), $\psi_n$ is an exact minimax rate of testing.

(3) If $f$ belongs to $S(\alpha,r,L)$ and the noise is exponentially smooth, $h =
h_*$ is a solution of (22) and



moreover, under Assumption (E), $\psi_n$ is an exact ($C^* = C_* = 1$) minimax
rate of testing for the case $r < s$.

We prove in the Appendix [4] exact lower bounds for the case r < s, but 
the same proof provides lower bounds precisely within a logarithmic factor 
for the case r > s. 

5. Lower bounds. We show in the first part that proofs for minimax 
lower bounds for the estimation problem of d and for the testing problem 
in L2 come down to the same choice of hypotheses and to checking similar 
conditions. 

Lemma 1. Let $f_0$ and $f_1$ be two probability densities in the class $W(\beta, L)$,
depending on $n$. If:

(a) for estimation, the densities are such that $\big|\|f_1\|_2^2 - \|f_0\|_2^2\big| \ge 2\varphi_n$, for some
$\varphi_n > 0$;

(a') for testing, the densities are such that $\|f_1 - f_0\|_2 \ge C\psi_n$, for some $\psi_n > 0$;

(b) $P_1 \ll P_0$ and there exists $0 < \eta < 1$ such that






if r < s 



if r > s; 




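then

$\inf_{d_n}\sup_{f\in W(\beta,L)}\varphi_n^{-1}E_f[|d_n - d|] \ge (1-\eta)(1-\sqrt{\eta})$,

$\inf_{\Delta_n}\sup_{f\in W(\beta,L)}\big(P_{H_0}(\Delta_n = 1) + P_{H_1(C,\psi_n)}(\Delta_n = 0)\big) \ge (1-\eta)(1-\sqrt{\eta})$.
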
Proof. For the estimation problem the risk $\sup_{f\in W(\beta,L)}\varphi_n^{-1}E_f[|d_n - d|]$
is bounded from below by the risk for two hypotheses,
$\max_{j=0,1}\varphi_n^{-1}E_{f_j}[|d_n - d_j|]$, and then we directly use Lemma 4 from
Butucea and Tsybakov [6].

For the testing problem, we choose two hypotheses, $f_0$, the density under
$H_0$, and another density $f_1$ under $H_1$ (which implies that $\|f_1 - f_0\|_2 \ge C\psi_n$,
for some $\psi_n > 0$). Then the risk for the test problem becomes

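$R_{\mathrm{test}} := \inf_{\Delta_n}\sup_{f\in W(\beta,L)}\big(P_{H_0}(\Delta_n = 1) + P_{H_1(C,\psi_n)}(\Delta_n = 0)\big)$

$\ge \inf_{\Delta_n}\Big\{P_0(\Delta_n = 1) + (1-\sqrt{\eta})\,P_0\Big(\Delta_n = 0,\ \frac{dP_1}{dP_0} \ge 1-\sqrt{\eta}\Big)\Big\}$.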

This gives 

Rtest>(l-^j)Py{-£>l-^j 

fo 

1- u^L 



fo 

which allows one to conclude when assumption (b) holds. □ 

We shall use in the proofs the following construction. Let $0 < \delta < 1$ be
small through the remaining proofs of lower bounds.

In the estimation problem, let us choose $f_0$, a density function in the
Sobolev class $W(\beta, a(\delta)L)$, respectively, $S(\alpha,r,a(\delta)L)$, where $0 < a(\delta) < 1$
is a constant depending on $\delta$ and defined for each setup, such that (23)
holds. Moreover, for the estimation problem we want to choose the Fourier
transform $\Phi_0$ to have compact support included in $(-2\delta, 2\delta)$.

In the testing problem, we have to assume that the density $f_0$ satisfies
(23).

Proof of (9) in Theorem 2 and of (7) in Theorem 5. This proof is 
based on a large family of hypotheses. Similar reasoning proves that the same 
construction is valid for proving lower bounds for both quadratic functional 
estimation and nonparametric testing in L2. 

Note that this setup includes Theorem 2 for $\beta < \sigma + 1/4$. This is not a
contradiction, since the lower bounds here are much slower than the parametric
$n^{-1/2}$ rate that the estimator attains; see Theorem 1.

Let $\theta_j$, $j = 1,\ldots,M$, be independent Bernoulli random variables and let $\Pi$
be the probability measure associated with them. For $h > 0$ small as $n \to \infty$
and for a function $H$ to be defined later, let

(25) $f_\theta(x) = f_0(x) + \sum_{j=1}^{M}\theta_j h^{\beta+\sigma+1} H_h(x - x_j)$,

where $H_h(\cdot) = (1/h)H(\cdot/h)$, $x_j = jh$ and $M$ is an integer such that $Mh =
1 - o(1)$, as $n \to \infty$ and for $h$ small. Note that the observations $Y_i$, $i =
1,\ldots,n$, when the underlying density is $f_\theta$, have density $p_\theta(x) = p_0(x) +
\sum_{j=1}^{M}\theta_j h^{\beta+\sigma+1} G_h(x - x_j)$, where the function $G$ is defined in Lemma 2
and $H$ is such that $\Phi^G(u) = \Phi^H(u)\Phi^g(u/h)$. Indeed, $(H_h(\cdot - x_j) * g)(x) =
H_h * g(x - x_j) = G_h(x - x_j)$. Using Lemmas 5 and 6 in the Appendix [4] we
see that the hypotheses fit into the model, that is, $f_\theta$ are density functions
for all $\theta$ belonging to the Sobolev class $W(\beta,L)$ and such that

$\Pi\big[\|f_\theta - f_0\|_2^2 \ge C n^{-4\beta/(4\beta+4\sigma+1)}\big] \to 1$,

as $n \to \infty$, for fixed $C > 0$.



Lemma 2. Let the function $G : [-1,0] \to \mathbb{R}$ be defined by

$G(x) = \exp\Big(-\frac{1}{1-(4x+3)^2}\Big) I_{[-1,-1/2]}(x) - \exp\Big(-\frac{1}{1-(4x+1)^2}\Big) I_{[-1/2,0]}(x)$.

Then $G$ is an infinitely differentiable function such that $\int G(x)\,dx = 0$ and
having all polynomial moments finite. Its Fourier transform is such that

$|\Phi^G(u)| \le C_G\exp(-a\sqrt{|u|})$ as $|u| \to \infty$

for some positive constants $C_G$, $a > 0$. Moreover $\Phi^G$ is an infinitely
differentiable, bounded function.

This construction is based on the function $f_a$ in Lepski and Levit [21],
page 133, and the asymptotic behavior of its Fourier transform follows from 
the reference therein. 
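
The zero-integral property in Lemma 2 can be checked numerically. The sketch below assumes that $G$ is the difference of two shifted copies of the bump $\exp(-1/(1-t^2))$, one on $[-1,-1/2]$ and one on $[-1/2,0]$; the minus signs are this sketch's reading of the formula, and the function names are hypothetical.

```python
import math


def bump(t):
    """exp(-1 / (1 - t^2)) on (-1, 1), and zero outside."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0


def G(x):
    """Assumed form of G: positive bump on [-1, -1/2], negative on [-1/2, 0]."""
    if -1.0 <= x <= -0.5:
        return bump(4.0 * x + 3.0)
    if -0.5 < x <= 0.0:
        return -bump(4.0 * x + 1.0)
    return 0.0


def integrate_G(num_points=20000):
    """Midpoint rule for the integral of G over [-1, 0]."""
    step = 1.0 / num_points
    return step * sum(G(-1.0 + (k + 0.5) * step) for k in range(num_points))
```

With an even number of midpoints the two bumps are sampled at matching points, so the computed integral cancels up to rounding error, in line with $\int G = 0$.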

We stress the fact that in this setup the hypothesis functions $f_\theta$ belong to
$H_1(C,\psi_n)$ with probability which tends to 1 when $n \to \infty$. In order to
bound the risk from below, a very small modification is needed in the proof
of Lemma 1 that we do not discuss in detail here. The last thing to check is
that the distance between resulting models is finite,

7nr=i^(^)vr(^)-nr=iPo(^) x2 



A* 




A I 



n i+Dv^ 

3=1 



-cr+l GhiXi 



i=l 



Po(Yi) 



1. 



Now, denote by $Y_{ij}$ those observations $Y_i$ belonging to the support of
$G_h(\cdot - x_j)$ and $a_{ij} = h^{\beta+\sigma+1}G_h(Y_{ij} - x_j)/p_0(Y_{ij})$. Since those intervals
are disjoint we write



A 2 = E 



fo 



n M 

nn(!+^ +CT+l 

i=ij=i 

i 



G h {Y K 



Po(Y k 



■K(d6,j) 



^/o ni 2 n( i +^-)+ o nc 1 -^-) ^ - 1 



i=l 



2 J 



i=l 



M ( 2 w 1 n 1 

<^/oIl olia+^o+olla- !) 

7=1 I j=l i=l J 



where we have used the facts that $(a + b)^2 \le 2a^2 + 2b^2$ and that $E_{f_0}[a_{ij}] = 0$
since $\int G = 0$. Moreover, the $a_{ij}$ are small with $n$ and $E_{f_0}[a_{ij}^2] \le c h^{2\beta+2\sigma+1}$ by
Lemma 6 in the Appendix [4]. Therefore



A < E 



in 



-j=l V «l^«2 / 



< l)E fo [a 2 d a 2 2 j] 



which is smaller than $cMn^2h^{4\beta+4\sigma+2} \le c'$.



□ 
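
The boundedness of the last expression in the proof can be verified by direct arithmetic. The worked step below assumes the bandwidth of Theorem 5, $h = n^{-2/(4\beta+4\sigma+1)}$, and that the number of perturbations satisfies $M \le 1/h$ (an assumption consistent with the grid $x_j = jh$ covering an interval of length close to one):

```latex
\[
  cMn^{2}h^{4\beta+4\sigma+2}
  \;\le\; c\,n^{2}h^{4\beta+4\sigma+1}
  \;=\; c\,n^{2}\bigl(n^{-2/(4\beta+4\sigma+1)}\bigr)^{4\beta+4\sigma+1}
  \;=\; c .
\]
```

Under these assumptions the distance between the models stays bounded exactly at the bandwidth for which $h^{2\beta}$ equals $n^{-4\beta/(4\beta+4\sigma+1)}$, the order of the squared separation between the hypotheses.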